Statistically Sound Exploratory Rule Discovery
Author
Abstract
Association rule discovery and other exploratory rule discovery techniques explore large search spaces of potential rules to find those that appear interesting by some user-selected criterion of interestingness. Because of the large number of rules considered, they suffer from an extreme risk of type-1 error: finding rules that appear to satisfy the interestingness criteria on the sample data only by chance. One method for avoiding this risk is to apply a correction for multiple comparisons during the rule discovery process. While this may result in statistically sound rule discovery with tight control over the risk of type-1 error, it introduces an extreme risk of type-2 error: rejecting rules that do in fact satisfy the interestingness criteria. This paper proposes a technique that overcomes this problem by using holdout data for statistical evaluation. Experiments demonstrate that traditional association rule discovery can produce large numbers of rules that are rejected when subjected to statistical evaluation on holdout data. They also reveal that modifying the rule discovery process to anticipate subsequent statistical evaluation can increase the number of rules that both satisfy an interestingness criterion and are accepted by statistical evaluation on holdout data.
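The abstract contrasts correcting for multiple comparisons across the whole search space with testing candidate rules on holdout data. Below is a minimal Python sketch of the holdout step only, under stated assumptions: rules have the form X → y over transactional data, candidate rules were already found on a separate exploratory partition, each rule is tested with a one-sided Fisher exact test of whether y is more frequent with X than without it, and a Bonferroni correction is applied over the number of rules submitted to the holdout test. The function names (`fisher_one_sided`, `contingency`, `holdout_evaluate`) and the data layout are hypothetical illustrations, not the paper's implementation.

```python
from math import comb
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical rule representation: (antecedent itemset X, consequent item y).
Rule = Tuple[FrozenSet[str], str]


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for a positive association.

    Table layout:        y     not y
                 X       a       b
                 not X   c       d
    Returns P(count >= a) under the hypergeometric null with fixed margins.
    """
    n_x = a + b          # transactions containing the antecedent X
    n_y = a + c          # transactions containing the consequent y
    n = a + b + c + d    # all holdout transactions
    upper = min(n_x, n_y)
    tail = sum(comb(n_y, k) * comb(n - n_y, n_x - k) for k in range(a, upper + 1))
    return tail / comb(n, n_x)


def contingency(rule: Rule, data: Sequence[Set[str]]) -> Tuple[int, int, int, int]:
    """Count the 2x2 table cells for rule X -> y over a list of transactions."""
    antecedent, consequent = rule
    a = b = c = d = 0
    for transaction in data:
        has_x = antecedent <= transaction   # X is a subset of the transaction
        has_y = consequent in transaction
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def holdout_evaluate(rules: List[Rule], holdout: Sequence[Set[str]],
                     alpha: float = 0.05) -> List[Tuple[Rule, float]]:
    """Keep rules whose holdout p-value passes a Bonferroni-corrected threshold.

    alpha is divided by the number of rules tested on the holdout data,
    not by the size of the whole search space explored during discovery.
    """
    if not rules:
        return []
    threshold = alpha / len(rules)
    accepted = []
    for rule in rules:
        p = fisher_one_sided(*contingency(rule, holdout))
        if p <= threshold:
            accepted.append((rule, p))
    return accepted


if __name__ == "__main__":
    # Toy holdout data, invented purely for illustration.
    holdout = ([{"bread", "butter"}] * 30 + [{"bread"}] * 5
               + [{"butter"}] * 5 + [{"milk"}] * 60)
    rules = [(frozenset({"bread"}), "butter"), (frozenset({"milk"}), "bread")]
    for (x, y), p in holdout_evaluate(rules, holdout):
        print(sorted(x), "->", y, f"p={p:.2e}")
```

In this toy run only `bread -> butter` should survive, since `milk` never co-occurs with `bread` in the holdout data. The design point the sketch is meant to show is the size of the correction: the threshold is alpha divided by the number of candidate rules rather than by the number of rules the search could have considered, so it is typically far less strict, which is how holdout evaluation can reduce type-2 error while still controlling type-1 error.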
Similar resources
Preliminary investigations into statistically valid exploratory rule discovery
Exploratory rule discovery, as exemplified by association rule discovery, has proven very popular. In this paper I investigate issues surrounding the statistical validity of rules found using this approach and methods that might be employed to deliver statistically sound exploratory rule discovery.
Pruning Derivative Partial Rules During Impact Rule Discovery
Because exploratory rule discovery works with data that is only a sample of the phenomena to be investigated, some resulting rules may appear interesting only by chance. Techniques are developed for automatically discarding statistically insignificant exploratory rules that cannot survive a hypothesis test with respect to their ancestors. We call such insignificant rules derivative extended rules. In t...
K-Optimal Pattern Discovery: An Efficient and Effective Approach to Exploratory Data Mining
Most data-mining techniques seek a single model that optimizes an objective function with respect to the data. In many real-world applications several models will equally optimize this function. However, they may not all equally satisfy a user's preferences, which will be affected by background knowledge and pragmatic considerations that are infeasible to quantify into an objective function. Th...
Efficiently Identifying Exploratory Rules' Significance
How to efficiently discard potentially uninteresting rules in exploratory rule discovery is one of the important research foci in data mining. Many researchers have presented algorithms to automatically remove potentially uninteresting rules utilizing background knowledge and user-specified constraints. Identifying the significance of exploratory rules using a significance test is desirable for...
Interactive Mining of Correlations - A Constraint Perspective
The problem of finding minimal sets of objects that are correlated and have a statistically significant number of occurrences in a database has recently received considerable attention as an alternative to association rules. A case for making mining human-centered, interactive, and exploratory via application-specific constraints was recently made in the papers [14, 12]. The technical results of th...